PAC Reinforcement Learning with an Imperfect Model
نویسنده
چکیده
Reinforcement learning (RL) methods have proved to be successful in many simulated environments. The common approaches, however, are often too sample intensive to be applied directly in the real world. A promising approach to addressing this issue is to train an RL agent in a simulator and transfer the solution to the real environment. When a highfidelity simulator is available we would expect significant reduction in the amount of real trajectories needed for learning. In this work we aim at better understanding the theoretical nature of this approach. We start with a perhaps surprising result that, even if the approximate model (e.g., a simulator) only differs from the real environment in a single state-action pair (but which one is unknown), such a model could be information-theoretically useless and the sample complexity (in terms of real trajectories) still scales with the total number of states in the worst case. We investigate the hard instances and come up with natural conditions that avoid the pathological situations. We then propose two conceptually simple algorithms that enjoy polynomial sample complexity guarantees with no dependence on the size of the state-action space, and prove some foundational results to provide insights into this important problem.
منابع مشابه
Expected Mistake Bound Model for On-Line Reinforcement Learning
We propose a model of eecient on-line reinforcement learning based on the expected mistake bound framework introduced by Haussler, Littlestone and Warmuth (1987). The measure of performance we use is the expected diierence between the total reward received by the learning agent and that received by an agent behaving optimally from the start. We call this expected diierence the cumulative mistak...
متن کاملProbably Approximately Correct (PAC) Exploration in Reinforcement Learning
OF THE DISSERTATION Probably Approximately Correct (PAC) Exploration in Reinforcement Learning by Alexander L. Strehl Dissertation Director: Michael Littman Reinforcement Learning (RL) in finite state and action Markov Decision Processes is studied with an emphasis on the well-studied exploration problem. We provide a general RL framework that applies to all results in this thesis and to other ...
متن کاملPAC-Bayesian Policy Evaluation for Reinforcement Learning
Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first...
متن کاملReinforcement Learning in Finite MDPs: PAC Analysis Reinforcement Learning in Finite MDPs: PAC Analysis
We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These “PAC-MDP” algorithms include the well-known E and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state-of-the-art by presenting bounds for the problem in a unified theoretical framework. We also present a...
متن کاملSample Efficient Reinforcement Learning with Gaussian Processes
This paper derives sample complexity results for using Gaussian Processes (GPs) in both modelbased and model-free reinforcement learning (RL). We show that GPs are KWIK learnable, proving for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP). However, we then show that previous approaches to model-free RL using GPs take an exponential number of step...
متن کامل